Pivot and Join

Part 1: tidy data for functional programming and iteration

John Little

Duke University Libraries

Center for Data & Visualization Sciences

2023-09-19

Tidy data

Tidy data1

Wide data

Code
library(tidyverse)
library(gt)

tidyr::relig_income |> 
  gt::gt_preview() |> 
  gtExtras::gt_theme_dark()
religion <$10k $10-20k $20-30k $30-40k $40-50k $50-75k $75-100k $100-150k >150k Don't know/refused
1 Agnostic 27 34 60 81 76 137 122 109 84 96
2 Atheist 12 27 37 52 35 70 73 59 74 76
3 Buddhist 27 21 30 34 33 58 62 39 53 54
4 Catholic 418 617 732 670 638 1116 949 792 633 1489
5 Don’t know/refused 15 14 15 11 10 35 21 17 18 116
6..17
18 Unaffiliated 217 299 374 365 341 528 407 321 258 597

Tall data

Code
relig_income |> 
  pivot_longer(cols = -religion, 
               names_to = "income_category", 
               values_to = "income") |> 
  gt::gt_preview() |> 
  gtExtras::gt_theme_dark()
religion income_category income
1 Agnostic <$10k 27
2 Agnostic $10-20k 34
3 Agnostic $20-30k 60
4 Agnostic $30-40k 81
5 Agnostic $40-50k 76
6..179
180 Unaffiliated Don't know/refused 597
Code
relig_income |> 
  pivot_longer(cols = -religion, 
               names_to = "income_category", 
               values_to = "income") |> 
  mutate(religion = fct_relevel(religion, "Evangelical Prot", "Mainline Prot", "Catholic", "Unaffiliated", "Historically Black Prot")) |> 
  mutate(income_category = fct_rev(as_factor(income_category))) |>
  ggplot(aes(income, income_category)) +
  geom_col(fill = "#eee8d5") +
  facet_wrap(vars(
    fct_other(
      religion, 
      keep = c("Evangelical Prot", "Mainline Prot", "Catholic", "Unaffiliated", "Historically Black Prot")))) +
  theme(plot.background = element_rect(fill = "#002b36"),
        text = element_text(color = "#eee8d5"),
        axis.text = element_text(color = "#eee8d5"), 
        panel.background = element_rect(fill = "#002b36"),
        panel.grid = element_line(color = "#002b36"),
        strip.background = element_rect(fill = "#7b9c9f"))

Code

 

relig_income |> 
  pivot_longer(cols = -religion, names_to = "income_category") |> 
  ggplot(aes(income, income_category)) +
  geom_col() +
  facet_wrap(vars(religion))

Pivot

{ tidyr }


tidyr::pivot_wider()

tidyr::pivot_longer()

Image Credit: apreshill | CC BY 4.0 | https://github.com/apreshill/teachthat/blob/master/pivot/pivot_longer_smaller.gif]

Join

{ dplyr }

fav_ratings  |>  
  left_join(starwars, 
            by = join_by(name)) 

Two datasets

{ dplyr::starwars }

name height mass gender homeworld
1 Luke Skywalker 172 77 masculine Tatooine
2 C-3PO 167 75 masculine Tatooine
3 R2-D2 96 32 masculine Naboo
4 Darth Vader 202 136 masculine Tatooine
5 Leia Organa 150 49 feminine Alderaan
6 Owen Lars 178 120 masculine Tatooine
7 Beru Whitesun lars 165 75 feminine Tatooine
8 R5-D4 97 32 masculine Tatooine
9..86
87 Padmé Amidala 165 45 feminine Naboo

538.com

name fav_rating
1 Han Solo 610
2 Luke Skywalker 552
3 Princess Leia Organa 547
4 Anakin Skywalker 245
5 Obi Wan Kenobi 591
6 Emperor Palpatine 110
7 Darth Vader 310
8 Lando Calrissian 142
9..13
14 Yoda 605

Data gathered then transformed from
https://github.com/fivethirtyeight/data/tree/master/star-wars-survey

joined data

One data frame


fav_ratings  |>  
  left_join(starwars, 
            by = join_by(name))  
name fav_rating height mass gender homeworld
Han Solo 610 180 80.0 masculine Corellia
Luke Skywalker 552 172 77.0 masculine Tatooine
Princess Leia Organa 547 NA NA NA NA
Anakin Skywalker 245 188 84.0 masculine Tatooine
Obi Wan Kenobi 591 NA NA NA NA
Emperor Palpatine 110 NA NA NA NA
Darth Vader 310 202 136.0 masculine Tatooine
Lando Calrissian 142 177 79.0 masculine Socorro
Boba Fett 138 183 78.2 masculine Kamino
C-3P0 474 NA NA NA NA
R2 D2 562 NA NA NA NA
Jar Jar Binks 112 196 66.0 masculine Naboo
Padme Amidala 168 NA NA NA NA
Yoda 605 66 17.0 masculine NA